TakeHomeEx3

Author

Lin Lin

Objective definition:

FishEye International, a non-profit focused on countering illegal, unreported, and unregulated (IUU) fishing, has been given access to an international finance corporation’s database on fishing related companies. In the past, FishEye has determined that companies with anomalous structures are far more likely to be involved in IUU (or other “fishy” business). FishEye has transformed the database into a knowledge graph. It includes information about companies, owners, workers, and financial status. FishEye is aiming to use this graph to identify anomalies that could indicate a company is involved in IUU.

FishEye analysts have attempted to use traditional node-link visualizations and standard graph analyses, but these were found to be ineffective because the scale and detail in the data can obscure a business’s true structure.

The research below aims to help FishEye develop a new visual analytics approach to better understand fishing business anomalies.

We will use visual analytics to understand patterns of groups in the knowledge graph and highlight anomalous groups.

Task 1: Use visual analytics to identify anomalies in the business groups present in the knowledge graph.

Task 2: Develop a visual analytics process to find similar businesses and group them. This analysis should focus on a business’s most important features and present those features clearly to the user.

1. Data Pre-processing and cleaning

Load the libraries and read the JSON relationship file MC3.

  • jsonlite: A lightweight R package for working with JSON data, providing functions to convert JSON to R objects and vice versa.

  • tidygraph: A tidyverse package that provides a tidy and consistent approach to working with graph data structures, allowing for easy manipulation, visualization, and analysis of networks.

  • ggraph: An extension of the ggplot2 package that specializes in creating aesthetically pleasing and customizable visualizations of graphs and networks.

  • visNetwork: An R package that utilizes the vis.js library to create interactive network visualizations, allowing for exploration and analysis of complex networks.

  • tidyverse: A collection of R packages, including ggplot2, dplyr, tidyr, and others, designed to provide a cohesive and consistent framework for data manipulation, visualization, and analysis.

  • shiny: An R package for building interactive web applications directly from R code, enabling the creation of user-friendly and responsive data-driven applications.

  • plotly: An R package that provides a high-level interface for creating interactive and dynamic visualizations, allowing users to explore and analyze data through features like hover effects, zooming, and panning.

  • graphlayouts: An R package that offers various algorithms for laying out and visualizing graph structures, providing options for arranging nodes and edges in a visually meaningful way.

  • ggforce: An extension package for ggplot2 that extends its capabilities by introducing new geoms, statistical transformations, and scales, enabling users to create more advanced and specialized plots.

  • tidytext: A tidyverse package that provides tools for text mining and analysis, allowing users to manipulate, explore, and visualize text data using the principles of tidy data.

  • skimr: An R package that provides concise and informative summaries of data frames, providing a quick overview of variables’ distributions, missing values, and other summary statistics.

  #echo | false
  #tidytext -- text mining library with R: https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html
  #Load Libraries   
  pacman::p_load(jsonlite,tidygraph, ggraph, visNetwork, tidyverse, shiny, plotly, graphlayouts, ggforce, tidytext,skimr)   
  #load Data   
  MC3<- fromJSON("data/MC3.json")

Data Cleaning for MC3 Nodes and Edges

We select the fields we need and reorder the columns with the select function. Each node in MC3 is a company or a person, described by its country, type, revenue (revenue_omu), and products and services (product_services).

On loading the data, we found that the graph is not directed, so the in/out direction of each connection is unknown.

The code below extracts the nodes for further processing.

  #glimpse(MC3)
  MC3_nodes <- as_tibble(MC3$nodes)
  colSums(is.na(MC3_nodes))
         country               id product_services      revenue_omu 
               0                0                0                0 
            type 
               0 
  #Extract and mutate the format so it's not list but dataframe
  MC3_nodes_clean <- MC3_nodes %>% mutate(country = as.character(country),
                                          id = as.character(id),
                                          product_services = as.character(product_services),
                                          revenue_omu = as.numeric(as.character(revenue_omu)),    #we need to convert to numeric directly
                                          type = as.character(type)) %>%
    select(id, country, type, revenue_omu, product_services)
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `revenue_omu = as.numeric(as.character(revenue_omu))`.
Caused by warning:
! NAs introduced by coercion

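The original data contains no NA values. However, converting the columns into a plain table format introduces NAs in revenue_omu: entries that cannot be parsed as numbers are coerced to NA, as the warning above shows.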
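A minimal sketch of how this coercion can produce NAs, assuming revenue_omu arrives from fromJSON() as a list column in which some entries are empty. The toy values below are made up for illustration and are not taken from MC3.

```r
# Toy list column standing in for MC3$nodes$revenue_omu (values are made up)
revenue_raw <- list(1000.5, list(), "2500", character(0))

# as.character() turns empty entries into the strings "list()" and "character(0)"
revenue_chr <- as.character(revenue_raw)

# as.numeric() cannot parse those strings, so they become NA (with a warning,
# suppressed here to keep the output clean)
revenue_num <- suppressWarnings(as.numeric(revenue_chr))

# Share of values that are missing after the conversion
missing_share <- mean(is.na(revenue_num))
```

Checking the character version first (for example with table(revenue_chr)) is one way to see which raw values will be lost before converting.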

  #check data quality, find missing value
  colSums(is.na(MC3_nodes_clean))
              id          country             type      revenue_omu 
               0                0                0            21515 
product_services 
               0 
  #check which are the types?
  unique(MC3_nodes_clean$type)
[1] "Company"          "Company Contacts" "Beneficial Owner"
  skim(MC3_nodes_clean)
Data summary
Name MC3_nodes_clean
Number of rows 27622
Number of columns 5
_______________________
Column type frequency:
character 4
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1 6 64 0 22929 0
country 0 1 2 15 0 100 0
type 0 1 7 16 0 3 0
product_services 0 1 4 1737 0 3244 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
revenue_omu 21515 0.22 1822155 18184433 3652.23 7676.36 16210.68 48327.66 310612303 ▇▁▁▁▁

Of the 27,622 node rows, 21,515 have no value for revenue_omu, a missing rate of 77.9%, so these missing values need to be handled. In addition, only 22,929 distinct ids appear across the 27,622 rows (83.0%), so some ids are duplicated.

Remove duplicates in nodes: if two rows share an id but differ in any of the other four columns (country, type, revenue_omu, product_services), keep both rows. If two rows are identical in every column, remove the duplicate row.
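The rule above can be sketched on a toy table. The data frame and its values are hypothetical; only the use of base R's duplicated() mirrors the approach described here.

```r
# Toy node table (made-up values): id "A" appears twice with identical rows,
# id "B" appears twice with different countries
nodes <- data.frame(
  id      = c("A", "A", "B", "B", "C"),
  country = c("X", "X", "X", "Y", "Z"),
  stringsAsFactors = FALSE
)

# duplicated() on the whole data frame flags only fully identical rows
nodes_dedup <- nodes[!duplicated(nodes), ]

# ids that still occur more than once differ in at least one other column
still_duplicated <- unique(nodes_dedup$id[duplicated(nodes_dedup$id)])
```

Here the second "A" row is dropped, while both "B" rows survive because their countries differ.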

  #check which are the duplicate ids
  duplicate_ids <- MC3_nodes_clean[duplicated(MC3_nodes_clean$id), "id"]

  #use R base function duplicate to achieve this
  MC3_nodes_clean <- MC3_nodes_clean[!duplicated(MC3_nodes_clean), ]

  DT::datatable(MC3_nodes_clean)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

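After removing fully identical rows, around 2,000 rows were dropped, out of the 4,693 rows whose id duplicates an earlier row. The remaining duplicated ids differ in at least one other column and are kept.

The code below extracts the edges for further processing.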

  MC3_edges <- as_tibble(MC3$links) %>% 
  distinct() %>%
  mutate(source = as.character(source),
         target = as.character(target),
         type = as.character(type)) %>%
  group_by(source, target, type) %>%
    summarise(weights = n()) %>%
  filter(source!=target) %>%
  ungroup()
`summarise()` has grouped output by 'source', 'target'. You can override using
the `.groups` argument.
  #check missing value
  colSums(is.na(MC3_edges))
 source  target    type weights 
      0       0       0       0 
  skim(MC3_edges)
Data summary
Name MC3_edges
Number of rows 24036
Number of columns 4
_______________________
Column type frequency:
character 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
source 0 1 6 700 0 12856 0
target 0 1 6 28 0 21265 0
type 0 1 16 16 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
weights 0 1 1 0 1 1 1 1 1 ▁▁▇▁▁

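There are no missing values in the edge data. Note that every weight equals 1 (mean 1, sd 0 in the summary above), because distinct() removes repeated rows before they are counted. Next, we explore the dataset.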
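A small sketch of how the order of de-duplication and counting affects the weights, using a made-up edge list. Whether repeated relationships should count towards the weight is an assumption that depends on the analysis; the names w_after_distinct and w_direct are illustrative only.

```r
library(dplyr)

# Toy edge list (made-up values): the A -> p1 relationship is recorded twice
edges <- tibble(
  source = c("A", "A", "B"),
  target = c("p1", "p1", "p2"),
  type   = c("Beneficial Owner", "Beneficial Owner", "Company Contacts")
)

# distinct() first: repeated rows are gone before counting, so weights are all 1
w_after_distinct <- edges %>%
  distinct() %>%
  count(source, target, type, name = "weights")

# Counting directly keeps the multiplicity of each relationship
w_direct <- edges %>%
  count(source, target, type, name = "weights")
```

With the toy data, counting directly gives the repeated A to p1 relationship a weight of 2, while the distinct-first version gives every edge a weight of 1.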

  #check which are the types?
  unique(MC3_edges$type)
[1] "Company Contacts" "Beneficial Owner"
  MC3_edges_clean <- MC3_edges %>% mutate(source = as.character(source),
                       target = as.character(target),
                       edgeType = as.character(type)) %>%
  group_by(source, target, edgeType) %>%
  summarise(weights = n()) %>%
  filter(source!=target) %>%
  ungroup()
`summarise()` has grouped output by 'source', 'target'. You can override using
the `.groups` argument.
  #datatable() of DT package is used to display mc3_edges tibble data frame as an interactive table on the html document.
  
  DT::datatable(MC3_edges_clean)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

What are the node and edge types?

From the exploration above, there are 3 possible node types (Company, Company Contacts, Beneficial Owner) and 2 possible edge types (Company Contacts, Beneficial Owner).

The plots below show the count of each type for nodes and edges respectively.

To find business groups, we check the types present in each data set. There might be owner-business, customer-business, or business-business relationships.

  ggplot(data = MC3_nodes_clean,
         aes(x= type)) +
    geom_bar()+
  labs(title = "Node Types Distribution",
       x = "Node Types",
       y = "Count")

  ggplot(data = MC3_edges_clean,
       aes(x = edgeType)) +
    geom_bar() +
  labs(title = "Edge connection Types Distribution",
       x = "Edge Types",
       y = "Count")

Assumption: a node of type Company is a legal entity, while a node of type Company Contacts or Beneficial Owner is a natural person.
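As a sketch, this assumption can be made explicit as an extra column. The column name entity_kind and the toy rows are hypothetical and do not appear in the original analysis.

```r
library(dplyr)

# Toy nodes covering the three node types found in MC3
nodes <- tibble(
  id   = c("n1", "n2", "n3"),
  type = c("Company", "Company Contacts", "Beneficial Owner")
)

# entity_kind is a hypothetical helper column encoding the assumption above
nodes_flagged <- nodes %>%
  mutate(entity_kind = if_else(type == "Company",
                               "legal entity",
                               "natural person"))
```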

Check source and target types mapping

From the edge file, we first check whether each source and target id has a matching record in the node data.

  # Count the number of targets in MC3_edges_clean that exist in MC3_nodes_clean
  existing_targets_count <- MC3_edges_clean %>%
  mutate(target = as.character(target)) %>%
  semi_join(select(MC3_nodes_clean, id), by = c("target" = "id")) %>%
  summarise(targets_found = n_distinct(target))
  print(existing_targets_count)
# A tibble: 1 × 1
  targets_found
          <int>
1             0
  # Count the number of sources in MC3_edges_clean that exist in MC3_nodes_clean
  existing_sources_count <- MC3_edges_clean %>%
  mutate(source = as.character(source)) %>%
  semi_join(select(MC3_nodes_clean, id), by = c("source" = "id")) %>%
  summarise(sources_found = n_distinct(source))
  print(existing_sources_count)
# A tibble: 1 × 1
  sources_found
          <int>
1          4880

Some source ids were found in the node data, but no target id was found at all. Looking at the edge data, the targets all appear to be persons' names, so the edgeType in the edge table is treated as the target's type.
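The matching check can be sketched with dplyr's filtering joins on a toy example. The tables and names below are made up; semi_join() keeps rows with a match, and anti_join() lists the rows without one.

```r
library(dplyr)

# Toy node and edge tables (made-up values)
nodes <- tibble(id = c("Company A", "Company B"), type = "Company")
edges <- tibble(
  source = c("Company A", "Company B", "Company C"),
  target = c("Person 1", "Person 2", "Person 3")
)

# semi_join() keeps edge rows whose source has a match in the node table
sources_found <- edges %>%
  semi_join(nodes, by = c("source" = "id")) %>%
  summarise(n = n_distinct(source)) %>%
  pull(n)

# anti_join() lists the edge rows whose target has no match
targets_missing <- edges %>%
  anti_join(nodes, by = c("target" = "id")) %>%
  pull(target)
```

Inspecting the unmatched ids directly, rather than only counting matches, makes it easier to see what kind of entity they are.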

  # Consolidate node information and add source node types, target node type will not be found from node join. Edge type is treated as the target type
   MC3_edges_clean_Join <- MC3_edges_clean %>%
  left_join(select(MC3_nodes_clean, 
      #%>% filter(type == "Company"),
      id, sourceNodeType = type), by = c("source" = "id")) %>%
  group_by(source, target, sourceNodeType) %>%
  filter(source != target) %>%
  distinct() %>%
  ungroup()
Warning in left_join(., select(MC3_nodes_clean, id, sourceNodeType = type), : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 7 of `x` matches multiple rows in `y`.
ℹ Row 7497 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
  # Plot the stacked bar chart
  ggplot(data = MC3_edges_clean_Join, aes(x = sourceNodeType, fill = edgeType)) +
  geom_bar() +
  labs(title = "Source Node Types with Breakdown of Target edgeType",
       x = "Source Types",
       y = "Count") +
  scale_fill_discrete(name = "Target Type")+
    coord_flip()

Some edges have a source of type Beneficial Owner or Company Contacts pointing to a target of type Beneficial Owner or Company Contacts. By rights, all sources should be companies. Since non-company sources are few, consider excluding them.

For sources of type Company, additional company information (revenue, products/services) comes mainly from the node data, while people-related information comes from the edge data (edge type). We can use this to derive a new nodes data frame from the edges data frame.
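A minimal sketch of deriving a node table from an edge table, under the assumption that every id at either end of an edge should become a node. The toy tables and the name nodes_from_edges are illustrative only.

```r
library(dplyr)

# Toy edge table and partial node attributes (made-up values)
edges <- tibble(
  source   = c("Company A", "Company A", "Company B"),
  target   = c("Person 1", "Person 2", "Person 1"),
  edgeType = c("Beneficial Owner", "Company Contacts", "Beneficial Owner")
)
nodes_known <- tibble(id = "Company A", type = "Company", revenue_omu = 5000)

# Every id that appears at either end of an edge becomes a node;
# attributes are attached where the node data has them, otherwise NA
nodes_from_edges <- tibble(id = c(edges$source, edges$target)) %>%
  distinct() %>%
  left_join(nodes_known, by = "id")
```

Ids without a record in the node data keep NA attributes, which is why the derived table can contain more rows than the original node data matches.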

3. Derive New Node Data, Building network model with tidygraph

   id1 <- MC3_edges_clean %>%
  select(source) %>%
  rename(id = source)
  id2 <- MC3_edges_clean %>%
    select(target) %>%
    rename(id = target)
  MC3_nodes1 <- rbind(id1, id2) %>%
    distinct() %>%
    left_join(MC3_nodes_clean,
              unmatched = "drop")
Joining with `by = join_by(id)`
  mc3_graph <- tbl_graph(nodes = MC3_nodes1,
                       edges = MC3_edges_clean,
                       directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
  
  mc3_graph %>%
  filter(betweenness_centrality >= 100000) %>%
  ggraph(layout = "fr") +
    geom_edge_link(aes(alpha=0.5, colour = edgeType)) +
    geom_node_point(aes(
      size = betweenness_centrality,
      colour = type,
      alpha = 0.5)) +
    scale_size_continuous(range=c(1,10))+
    theme_graph()
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
not found in Windows font database

Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
family not found in Windows font database


4. Consolidate counting information

With the above knowledge graph, we want to know: (1) companies vs. owners: how many owners does each company have? (2) companies vs. company contacts: how many contacts does each company have? (3) owners vs. companies: which owners own multiple companies?

After that, we can categorize the relationships manually and give them labels.
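The three counts can be sketched on a toy edge list as follows. The data are made up; the column names follow the ones used in this document.

```r
library(dplyr)
library(tidyr)

# Toy edge list (made-up values)
edges <- tibble(
  source   = c("Company A", "Company A", "Company A", "Company B"),
  target   = c("Person 1", "Person 2", "Person 3", "Person 1"),
  edgeType = c("Beneficial Owner", "Beneficial Owner",
               "Company Contacts", "Beneficial Owner")
)

# One row per company, one column per edge type; absent combinations become 0
company_counts <- edges %>%
  count(source, edgeType, name = "count") %>%
  pivot_wider(names_from = edgeType, values_from = count, values_fill = 0)

# Number of distinct companies per beneficial owner
owner_counts <- edges %>%
  filter(edgeType == "Beneficial Owner") %>%
  group_by(target) %>%
  summarise(numOfCompanyOwned = n_distinct(source), .groups = "drop")
```

In the toy data, Company B has no contact edges, so values_fill = 0 gives it an explicit zero rather than NA.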

  # Counting Beneficial Owner and Company Contacts for each company
  company_counts <- MC3_edges_clean %>%
  filter(edgeType %in% c("Beneficial Owner", "Company Contacts")) %>%
  group_by(source, edgeType) %>%
  summarise(count = n()) %>%
  pivot_wider(names_from = edgeType, values_from = count, values_fill = 0)
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
  # Counting companies owned by each Beneficial Owner
  owner_counts <- MC3_edges_clean %>%
    filter(edgeType == "Beneficial Owner") %>%
    group_by(target) %>%
    summarise(numOfCompanyOwned = n_distinct(source))

  # Update the nodes with the count information
  MC3_nodes_updated <- MC3_nodes_clean %>%
    left_join(company_counts, by = c("id" = "source")) %>%
    left_join(owner_counts, by = c("id" = "target"))

Generate the same counts for the records in MC3_edges_fishing, based on the relationships observed.

  # Counting Beneficial Owner and Company Contacts for each company
  company_counts1 <- MC3_edges_fishing %>%
  filter(edgeType %in% c("Beneficial Owner", "Company Contacts")) %>%
  group_by(source, edgeType) %>%
  summarise(count = n()) %>%
  pivot_wider(names_from = edgeType, values_from = count, values_fill = 0)%>%
  rename(numOfBenOwner = "Beneficial Owner", numOfComContact = "Company Contacts")
`summarise()` has grouped output by 'source'. You can override using the
`.groups` argument.
  # Counting companies owned by each Beneficial Owner
  owner_counts1 <- MC3_edges_fishing %>%
    filter(edgeType == "Beneficial Owner") %>%
    group_by(target) %>%
    summarise(numOfCompanyOwned = n_distinct(source))

  # Update the nodes with the count information, and take out Undesired companies
    MC3_nodes_fishupdated <- MC3_nodes_fishNetwork2 %>%
    left_join(company_counts1, by = c("id" = "source")) %>%
    left_join(owner_counts1, by = c("id" = "target")) %>%
    distinct(id, .keep_all = TRUE) %>%
    filter(!(type == "Company" & product_services %in% c("Unknown", "character(0)")))

# 
#     visNetwork(MC3_nodes_fishupdated_filtered,
#            MC3_edges_fishing)

4.1 Understanding the company to owner, company to contact, owner to company relationship with the count distribution

  library(patchwork)
  # Calculate average values
  avg_ben_owner <- mean(MC3_nodes_fishupdated$numOfBenOwner, na.rm = TRUE)
  avg_com_contact <- mean(MC3_nodes_fishupdated$numOfComContact, na.rm = TRUE)
  avg_company_owned <- mean(MC3_nodes_fishupdated$numOfCompanyOwned, na.rm = TRUE)
  

  # Create separate histogram plots
  hist1 <- ggplot(MC3_nodes_fishupdated) +
    geom_histogram(aes(x = numOfBenOwner), fill = "skyblue", color = "black", bins = 20) +
    # linewidth replaces the size aesthetic for lines, deprecated since ggplot2 3.4.0
    geom_vline(xintercept = avg_ben_owner, color = "red", linetype = "dashed", linewidth = 1) +
    labs(title = "Distribution of number of Beneficial Owners of each company",
         x = "Number of Beneficial Owners",
         y = "Frequency",
         caption = paste("Average owners count:", avg_ben_owner)) +
    theme(plot.caption = element_text(hjust = 0))

  hist2 <- ggplot(MC3_nodes_fishupdated) +
    geom_histogram(aes(x = numOfComContact), fill = "lightgreen", color = "black", bins = 20) +
    geom_vline(xintercept = avg_com_contact, color = "red", linetype = "dashed", linewidth = 1) +
    labs(title = "Distribution of number of Company Contacts of each company",
         x = "Number of Company Contacts",
         y = "Frequency",
         caption = paste("Average company contact count:", avg_com_contact)) +
    theme(plot.caption = element_text(hjust = 0))
  
  hist3 <- ggplot(MC3_nodes_fishupdated) +
    geom_histogram(aes(x = numOfCompanyOwned), fill = "lightpink", color = "black", bins = 20) +
    geom_vline(xintercept = avg_company_owned, color = "red", linetype = "dashed", linewidth = 1) +
    labs(title = "Distribution of Companies Owned by Beneficial Owner",
         x = "Number of Companies Owned",
         y = "Frequency",
         caption = paste("Average companies owned count:", avg_company_owned)) +
    theme(plot.caption = element_text(hjust = 0))
  
  # Arrange the plots vertically
  hist_combined <- hist1 / hist2 / hist3 +
  plot_layout(nrow = 3)
  
  hist_combined
Warning: Removed 2835 rows containing non-finite values (`stat_bin()`).
Warning: Removed 2835 rows containing non-finite values (`stat_bin()`).
Warning: Removed 1401 rows containing non-finite values (`stat_bin()`).

4.2 Labeling potential anomalies

Next, manually label potential anomalies using knowledge from the distributions above.
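One way to turn a distribution into candidate thresholds is the boxplot rule from base R's boxplot.stats(), which reports values lying more than 1.5 times the hinge spread beyond the hinges. The sketch below uses made-up counts to show how the upper fence relates to the reported outliers.

```r
# Toy counts (made-up values) with one clearly large observation
owners <- c(1, 1, 2, 2, 2, 3, 3, 4, 12)

stats <- boxplot.stats(owners)   # default coef = 1.5

# stats$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
lower_hinge <- stats$stats[2]
upper_hinge <- stats$stats[4]
upper_fence <- upper_hinge + 1.5 * (upper_hinge - lower_hinge)

# Values beyond the fences are returned in $out
outliers <- stats$out
```

Because the rule is symmetric, unusually small values (such as a count of 0) can be reported as outliers as well as unusually large ones.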

  # Detect outliers in numOfBenOwner
  outliers_ben_owner <- sort(boxplot.stats(MC3_nodes_fishupdated$numOfBenOwner)$out)
  # Detect outliers in numOfComContact
  outliers_com_contac <- sort(boxplot.stats(MC3_nodes_fishupdated$numOfComContact)$out)
  # Detect outliers in numOfCompanyOwned
  outliers_company_owned <- sort(boxplot.stats(MC3_nodes_fishupdated$numOfCompanyOwned)$out)
  
  # Print the outlier values
  cat("Outliers in numOfBenOwner:", outliers_ben_owner, "\n")
Outliers in numOfBenOwner: 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 10 10 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 13 15 15 15 15 16 16 16 17 18 18 18 20 21 21 21 22 22 22 22 23 24 24 25 26 27 29 29 30 30 32 33 34 36 39 42 47 48 
  cat("Outliers in numOfComContact:", outliers_com_contac, "\n")
Outliers in numOfComContact: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 6 6 6 6 6 6 6 6 8 9 9 9 9 9 15 
  cat("Outliers in numOfCompanyOwned:", outliers_company_owned, "\n")
Outliers in numOfCompanyOwned: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 
  MC3_nodes_fishupdated <- MC3_nodes_fishupdated %>%
  mutate(label = case_when(
    numOfBenOwner >= 8 ~ "Too Many Owners",
    numOfComContact == 0 ~ "No Company Contacts",
    numOfComContact >= 5 ~ "Many Company Contacts",
    numOfCompanyOwned >= 2 ~ "Own more than 1 company",
    TRUE ~ "Normal"
  ))
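Note that case_when() assigns the first label whose condition is TRUE, and a condition that evaluates to NA is treated as not matched. The sketch below uses hypothetical rows to show how rule order and missing counts interact. The thresholds are copied from the rules above.

```r
library(dplyr)

# Toy rows (made-up values); NA means the count was not available for that node
toy <- tibble(
  id                = c("A", "B", "C", "D"),
  numOfBenOwner     = c(9, 2, NA, 1),
  numOfComContact   = c(0, 0, NA, 1),
  numOfCompanyOwned = c(NA, NA, 3, NA)
)

toy_labeled <- toy %>%
  mutate(label = case_when(
    numOfBenOwner >= 8     ~ "Too Many Owners",        # checked first
    numOfComContact == 0   ~ "No Company Contacts",
    numOfComContact >= 5   ~ "Many Company Contacts",
    numOfCompanyOwned >= 2 ~ "Own more than 1 company",
    TRUE                   ~ "Normal"                  # fallback, also for all-NA rows
  ))
```

Row A has both many owners and no contacts, but it receives only the first matching label, so each node ends up in exactly one category.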

Next, we split the graph into separate views and investigate each kind of abnormality.

First, get a rough picture of how the labels spread across the node types.

  library(ggrepel)

  count_table <- MC3_nodes_fishupdated %>%
  group_by(type, label) %>%
  summarise(count = n()) %>%
  ungroup()
`summarise()` has grouped output by 'type'. You can override using the
`.groups` argument.
  # Plot the stacked bar chart
  ggplot(data = MC3_nodes_fishupdated, aes(x = type, fill = label)) +
  geom_bar() +
  # geom_text(data = count_table, aes(label = count), vjust = -0.5, color = "black") +
  labs(title = "Source Node Types with Breakdown of different business pattern label",
       x = "Source Types",
       y = "Count") +
  scale_fill_discrete(name = "business pattern label")+
    coord_flip()

Based on the proportions displayed above, we zoom in on the Too Many Owners, Own more than 1 company, and No Company Contacts labels in the knowledge graph.

4.3 Companies with Too Many Owners

First are the companies with Too Many Owners (eight or more beneficial owners).
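A minimal sketch of how such a view can be prepared, assuming the usual visNetwork input shape (an edge table with from and to columns, and a node table with id and an optional group column). The toy tables and the names flagged_ids, sub_edges and sub_nodes are hypothetical.

```r
library(dplyr)

# Toy labeled nodes and edges (made-up values)
nodes <- tibble(
  id    = c("Company A", "Company B", "Person 1", "Person 2"),
  type  = c("Company", "Company", "Beneficial Owner", "Beneficial Owner"),
  label = c("Too Many Owners", "Normal", "Normal", "Normal")
)
edges <- tibble(
  source = c("Company A", "Company A", "Company B"),
  target = c("Person 1", "Person 2", "Person 2")
)

# ids of the companies carrying the label of interest
flagged_ids <- nodes %>%
  filter(label == "Too Many Owners") %>%
  pull(id)

# Keep edges that start at a flagged company and rename to from/to
sub_edges <- edges %>%
  filter(source %in% flagged_ids) %>%
  rename(from = source, to = target)

# Nodes of the view: every endpoint of the kept edges, with type exposed as group
sub_nodes <- tibble(id = c(sub_edges$from, sub_edges$to)) %>%
  distinct() %>%
  left_join(nodes, by = "id") %>%
  rename(group = type)
```

Building the node table from the kept edges ensures that every edge endpoint has a node row, which the network rendering relies on.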

  MC3_nodes_toomanyowners <- MC3_nodes_fishupdated%>%
  filter(label == "Too Many Owners") 
  
  MC3_edges_toomanyowners <- MC3_edges_fishing %>%
  filter(source %in% MC3_nodes_toomanyowners$id)%>%
    rename(from = source)%>%
    rename(to = target)
  
  idS <- MC3_edges_toomanyowners %>%
  select(from) %>%
  rename(id = from)
  idT <- MC3_edges_toomanyowners %>%
  select(to) %>%
  rename(id = to)
  
  MC3_nodes_toomanyownersView <- rbind(idS, idT) %>%
  left_join(MC3_nodes_fishupdated, by = c("id" = "id"))%>%
  distinct() %>%   #define the type as different group for color
  rename (group = type)
    
  
  visGraph <- visNetwork(MC3_nodes_toomanyownersView,MC3_edges_toomanyowners, width = "100%")%>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visNodes(id = "id", label = "numOfBenOwner") %>%
  visEdges(arrows = 'to') %>%
  visOptions(selectedBy = "group",
             highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE,
                                     labelOnly = TRUE),
             nodesIdSelection = TRUE
             ) %>%
    visInteraction(navigationButtons = TRUE)%>%
  visLegend() %>%
  visLayout(randomSeed = 123)

  
 
  visGraph

4.4 Owners with more than one company

  group_colors <- c("Beneficial Owner" = "lightpink1",
                  "Company" = "cadetblue3",
                  "Company Contacts" = "grey")

  MC3_nodes_abnormalOwner <- MC3_nodes_fishupdated%>%
  filter(label == "Own more than 1 company") 
  
  MC3_edges_abnormalOwner <- MC3_edges_fishing %>%
  filter(target %in% MC3_nodes_abnormalOwner$id)%>%
    rename(from = source)%>%
    rename(to = target)
  
  idS1 <- MC3_edges_abnormalOwner %>%
  select(from) %>%
  rename(id = from)
  idT1 <- MC3_edges_abnormalOwner %>%
  select(to) %>%
  rename(id = to)
  
  MC3_nodes_abnormalOwnerView <- rbind(idS1, idT1) %>%
  left_join(MC3_nodes_fishupdated, by = c("id" = "id"))%>%
  rename (group = type) %>%   #define the type as different group for color
  distinct()
    
  
  visGraph <- visNetwork(MC3_nodes_abnormalOwnerView,MC3_edges_abnormalOwner, width = "100%")%>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visNodes(color = group_colors) %>%
  # visEdges(arrows = 'to') %>%
  visOptions(nodesIdSelection = TRUE,
             selectedBy = "group",
             highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE,
                                     labelOnly = TRUE)
             ) %>%
    visInteraction(navigationButtons = TRUE)%>%
  visLegend() %>%
  visLayout(randomSeed = 123)

  
 
  visGraph
Input to asJSON(keep_vec_names=TRUE) is a named vector. In a future version of jsonlite, this option will not be supported, and named vectors will be translated into arrays instead of objects. If you want JSON object output, please use a named list instead. See ?toJSON.

4.5 Companies with no contacts at all

  MC3_nodes_NoCompanyContacts <- MC3_nodes_fishupdated%>%
  filter(label == "No Company Contacts") 
  
  MC3_edges_NoCompanyContacts <- MC3_edges_fishing %>%
  filter(source %in% MC3_nodes_NoCompanyContacts$id)%>%
    rename(from = source)%>%
    rename(to = target)
  
  idS2 <- MC3_edges_NoCompanyContacts %>%
  select(from) %>%
  rename(id = from)
  idT2 <- MC3_edges_NoCompanyContacts %>%
  select(to) %>%
  rename(id = to)
  
  MC3_nodes_NoCompanyContactsView <- rbind(idS2, idT2) %>%
  left_join(MC3_nodes_fishupdated, by = c("id" = "id"))%>%
  rename (group = type) %>%   #define the type as different group for color
  distinct()
    
  
  visGraph <- visNetwork(MC3_nodes_NoCompanyContactsView,MC3_edges_NoCompanyContacts, width = "100%")%>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  # visNodes(color = group_colors) %>%
  visEdges(arrows = 'to') %>%
  visOptions(nodesIdSelection = TRUE,
             selectedBy = "group",
             highlightNearest = list(enabled = TRUE,
                                     degree = 1,
                                     hover = TRUE,
                                     labelOnly = TRUE)
             ) %>%
    visInteraction(navigationButtons = TRUE)%>%
  visLegend() %>%
  visLayout(randomSeed = 123)

  
 
  visGraph